On the creation of a pronunciation dictionary for Hungarian
Abstract
Recent research on the phonological structure of the mental lexicon has been based almost exclusively on English. Linguists and psychologists have been especially interested in identifying what constitutes a phonological neighborhood and how phonological neighborhoods are influenced by word frequency (cf. …). String edit distance is typically used as a measure of phonological similarity, but new measures are being proposed (cf. Kapatsinski, in press). However, research attempting to connect properties of the phonological lexicon to data from language acquisition, speech errors, and word similarity judgments has not adequately addressed how results may diverge in unrelated languages, so it is not clear whether the conclusions drawn for English can be generalized. This presentation therefore addresses the development of a comparable resource for Hungarian, an agglutinative language with several typologically distinctive properties. Because Hungarian has extensive inflectional and derivational morphology, we expect sound similarity to be more heavily influenced by morphology in Hungarian than in English. Additionally, because Hungarian words are substantially longer than English words, the notion of a phonological neighborhood may need to be redefined.

The pronunciation dictionary of Hungarian under consideration here is based on the Hoosier Mental Lexicon developed in the Psychology Department at Indiana University (Nusbaum et al., 1984). The goal is a text file with columns for the orthography, pronunciation, and corpus frequency of each word (the Hoosier Mental Lexicon additionally includes word familiarity ratings). The initial input was an orthographic word list of Hungarian developed at the Research Institute for Linguistics in Budapest during the 1980s (Kornai, 1986).
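The abstract names string edit distance as the usual similarity measure and describes the target file as one row per word with orthography, pronunciation, and corpus frequency. The following is a minimal Python sketch of how such a file could be queried for phonological neighbors; it is not the authors' tooling. The tab-separated layout, the space-separated segment transcription, the function names (`load_lexicon`, `edit_distance`, `neighbors`, `neighborhood_density`), and the five toy entries with their made-up frequencies are all assumptions for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    orthography: str
    pronunciation: tuple[str, ...]  # one string per segment, e.g. ("k", "eː", "z")
    frequency: int


def load_lexicon(text: str) -> list[Entry]:
    """Parse assumed tab-separated rows: orthography, space-separated segments, frequency."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        orth, pron, freq = row
        entries.append(Entry(orth, tuple(pron.split()), int(freq)))
    return entries


def edit_distance(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Levenshtein distance over segments (substitution, insertion, deletion)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def neighbors(target: Entry, lexicon: list[Entry], max_distance: int = 1) -> list[Entry]:
    """Words whose pronunciation is 1..max_distance edits from the target.

    Distance 0 (the target itself and any homophones) is excluded.
    """
    return [e for e in lexicon
            if 1 <= edit_distance(target.pronunciation, e.pronunciation) <= max_distance]


def neighborhood_density(target: Entry, lexicon: list[Entry],
                         max_distance: int = 1) -> tuple[int, int]:
    """Return (number of neighbors, summed corpus frequency of those neighbors)."""
    nbrs = neighbors(target, lexicon, max_distance)
    return len(nbrs), sum(e.frequency for e in nbrs)


# Toy input: made-up frequencies, not real corpus counts.
SAMPLE = (
    "kéz\tk eː z\t100\n"
    "kés\tk eː ʃ\t40\n"
    "méz\tm eː z\t30\n"
    "kész\tk eː s\t80\n"
    "ház\th aː z\t200\n"
)

lexicon = load_lexicon(SAMPLE)
print(sorted(e.orthography for e in neighbors(lexicon[0], lexicon)))  # ['kés', 'kész', 'méz']
print(neighborhood_density(lexicon[0], lexicon))                      # (3, 150)
```

The `max_distance` parameter is one way to accommodate the point that the one-edit neighborhood definition used for English may need to be redefined for longer Hungarian words; raising it to 2 adds ház to the neighbors of kéz in this toy lexicon.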
In creating the pronunciation dictionary, several phonological, morphological, and historical factors had to be considered. The spelling standards of modern Hungarian (called helyesírás) were developed and standardized in the late 19th and early 20th centuries (Benkő and Imre, 1972), and as a result the output of many morphophonological processes is reflected in the orthography. Hungarian linguists frequently remind native speakers that the Hungarian alphabet is not, in fact, phonetic. In this research, several sources were used to determine standards for the Budapest dialect. The deviations of pronunciation from orthography that remained to be accounted for were historical spelling variants in (1), segment degemination in superheavy syllables depending on sonority sequencing principles in (2), consonant cluster voicing assimilation in (3), variable high vowel lengthening in (4), final …
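Some of the deviations listed above, such as consonant cluster voicing assimilation, are rule-governed, so part of the orthography-to-pronunciation mapping can be automated. The Python sketch below tokenizes Hungarian spelling by greedy longest match over the digraphs and the trigraph dzs, then applies regressive voicing assimilation in obstruent clusters, treating v as a target but not a trigger. The grapheme table, the IPA-like symbols, the function names, and the simplifications are assumptions of this illustration, not the paper's procedure: morpheme boundaries are ignored (so a d + zs sequence across a boundary would be misread as dzs), h is ignored, four-letter ddzs is omitted, and degemination, high vowel lengthening, and historical spellings are not modeled.

```python
import unicodedata

# Hypothetical grapheme table: spelling -> list of segments (IPA-like symbols).
# Geminate digraphs double only their first letter in spelling (ssz, ggy, ...).
GRAPHEMES = {
    "dzs": ["dʒ"],
    "ccs": ["tʃ", "tʃ"], "ggy": ["ɟ", "ɟ"], "lly": ["j", "j"], "nny": ["ɲ", "ɲ"],
    "ssz": ["s", "s"], "tty": ["c", "c"], "zzs": ["ʒ", "ʒ"],
    "cs": ["tʃ"], "dz": ["dz"], "gy": ["ɟ"], "ly": ["j"], "ny": ["ɲ"],
    "sz": ["s"], "ty": ["c"], "zs": ["ʒ"],
    "c": ["ts"], "s": ["ʃ"],
    "a": ["ɒ"], "á": ["aː"], "e": ["ɛ"], "é": ["eː"], "í": ["iː"],
    "ó": ["oː"], "ö": ["ø"], "ő": ["øː"], "ú": ["uː"], "ü": ["y"], "ű": ["yː"],
}

# Voiced/voiceless obstruent pairs.
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k", "ɟ": "c", "z": "s",
                       "ʒ": "ʃ", "dz": "ts", "dʒ": "tʃ", "v": "f"}
VOICELESS_TO_VOICED = {vl: vd for vd, vl in VOICED_TO_VOICELESS.items()}
VOICED = set(VOICED_TO_VOICELESS)
OBSTRUENTS = VOICED | set(VOICELESS_TO_VOICED)
TRIGGERS = OBSTRUENTS - {"v"}  # v undergoes assimilation but does not trigger it


def tokenize(word: str) -> list[str]:
    """Greedy longest-match conversion of spelling to segments."""
    word = unicodedata.normalize("NFC", word).lower()
    segments: list[str] = []
    i = 0
    while i < len(word):
        for size in (3, 2, 1):
            chunk = word[i:i + size]
            if chunk in GRAPHEMES:
                segments.extend(GRAPHEMES[chunk])
                i += len(chunk)
                break
        else:
            # Letters not in the table (b, d, f, h, ...) stand for themselves.
            segments.append(word[i])
            i += 1
    return segments


def assimilate_voicing(segments: list[str]) -> list[str]:
    """Regressive voicing assimilation, applied right to left so clusters chain."""
    out = list(segments)
    for i in range(len(out) - 2, -1, -1):
        cur, nxt = out[i], out[i + 1]
        if cur in OBSTRUENTS and nxt in TRIGGERS:
            if nxt in VOICED:
                out[i] = VOICELESS_TO_VOICED.get(cur, cur)
            else:
                out[i] = VOICED_TO_VOICELESS.get(cur, cur)
    return out


def transcribe(word: str) -> str:
    return " ".join(assimilate_voicing(tokenize(word)))


print(transcribe("népdal"))  # n eː b d ɒ l
print(transcribe("vasból"))  # v ɒ ʒ b oː l
print(transcribe("hívtam"))  # h iː f t ɒ m
print(transcribe("kvarc"))   # k v ɒ r ts  (v does not voice the preceding k)
```

Processing right to left means that in a cluster of three or more obstruents, each segment agrees with the already-adjusted segment to its right, so the final obstruent determines the voicing of the whole cluster.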
Similar resources
On the creation of a pronunciation dictionary for Hungarian
This report describes the process of creating a pronunciation dictionary and phonological lexicon for Hungarian for the purpose of aiding in linguistic research on Hungarian phonology and phonotactics. The pronunciation dictionary was created by transforming orthographic forms to pronunciation representations by taking advantage of systematic deviations between Hungarian orthography and pronunc...
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
The Pronouncing Dictionary of Austrian German and the other Major Varieties of German - A Phonetic Resources Database on the Pronunciation of German
The paper gives a comprehensive overview on the project “Varieties of Austrian German Standard pronunciation and varieties of standard pronunciation” whose primary goal is the creation of a pronouncing dictionary of Austrian German and the creation of a large data base of audio samples for research on spoken language and different forms of pronunciation in Austria. The contents of the dictionar...
Effort and Accuracy during Language Resource Generation: A Pronunciation Prediction Case Study
When developing a language resource, there is generally a trade-off between the amount of effort invested in the resource creation process and the quality of the resulting resource. We argue that, in the developing world with its many resource-scarce languages, a ‘usable’ resource in multiple languages may be more valuable than a highly accurate resource for one language only. From this perspec...
Sentiment Analysis of Social Networking Data Using Categorized Dictionary
Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding correct opinion or interest from multi-facet sentiment data is a tedious task. In this paper, a method to improve the sentiment accuracy by utilizing the concept of categorized dictionary for sentiment classification and analysis is proposed. A categorized dictiona...